Segmental optical phonetics for human and machine speech processing
Abstract
It is now well known that talkers produce optical as well as acoustic speech signals, and that perceivers process both types of signals. Although perceptual effects due to audiovisual speech integration have been a focus of research involving the visual speech stimulus, relatively little is known about visual-only speech perception and optical phonetic signals. This knowledge is needed to exploit optical signals in applications such as synthetic talking heads and audiovisual automatic speech recognition (ASR). One important practical concern is the wide variation in performance among individual visual perceivers and talkers. This paper focuses on variation in visual phonetic perception, phoneme distinctiveness, and word recognition. The paper also introduces a project linking optical phonetics, speech kinematics, and perception.
Similar resources
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques have encountered many challenges for speech recognition, one of the challenges b...
Speech Synthesis
Speech Synthesis is undoubtedly a technological challenge with many potential applications in human-machine communication. More basically, it is a crossroads where researchers with many different backgrounds collaborate to put together their knowledge in computational linguistics, phonetics, prosody, physiology, vocal tract modeling, signal processing, image synthesis, experimental psychology, ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
The Effect of Using PRAAT Software on Pre-Intermediate EFL Learners’ Supra Segmental Features
The present study investigated the effect of using PRAAT, a free computer software package for the scientific analysis of speech in phonetics, on the suprasegmental features (i.e., intonation and stress) of pre-intermediate Iranian English as a foreign language (EFL) learners. The study used a quasi-experimental design with a pre-test and a post-test. In doing so...
A bag-of-features framework for incremental learning of speech invariants in unsegmented audio streams
We introduce a computational framework that allows a machine to bootstrap flexible autonomous learning of speech recognition skills. Technically, this framework shall enable a robot to incrementally learn to recognize speech invariants from unsegmented audio streams and with no prior knowledge of phonetics. To achieve this, we import the bag-of-words/bag-of-features approach from recent researc...
Publication date: 2000